Skip to content

Conversation

@pramodsatya
Copy link
Contributor

@pramodsatya pramodsatya commented Oct 30, 2025

Description

To support constant folding and consistent semantics between the Presto Java coordinator and the Presto C++ worker, it is necessary to use consistent expression evaluation. To support this, a native expression evaluation endpoint, v1/expressions, has been added to the Presto C++ sidecar. This endpoint leverages the ExprOptimizer in Velox to optimize Presto expressions.

The optimized Velox core::TypedExpr returned by Velox's ExprOptimizer is converted to a Presto protocol::RowExpression in the Presto native sidecar with helper class VeloxToPrestoExprConverter. The end to end flow between the coordinator and sidecar is as follows:

RowExpression == (PrestoToVeloxExpr) ==> TypedExpr == (Velox ExprOptimizer optimize() API) ==> optimized TypedExpr == (VeloxToPrestoExprConverter) ==> optimized RowExpression

If an error is encountered during Velox to Presto expression conversion, it is logged for further analysis and the unoptimized input RowExpression is returned instead. With the fuzzer testing (see test plan), we expect this endpoint to be ready for production workloads.
When the OptimizerLevel is EVALUATED, the endpoint throws for any error encountered during expression evaluation.

Motivation and Context

The native-sidecar-plugin will implement the ExpressionOptimizer interface from Presto SPI to utilize the v1/expressions endpoint on the sidecar for optimizing Presto expressions using the native expression evaluation engine.

The primary goal is to achieve consistency between C++ and Java semantics for expression optimization. With this change, C++ functions including UDFs can be used for constant folding of expressions in the Presto planner.
Please refer to RFC-0006.

Test Plan

Tests have been added by abstracting testcases from TestRowExpressionInterpreter to an interface AbstractTestExpressionInterpreter. This test interface is implemented in TestNativeExpressionInterpreter to test the v1/expressions endpoint on the sidecar end to end.
Unit tests for simple expression conversions are also added in VeloxToPrestoExprConverter.cpp.

This feature is still in Beta, and to support production workloads with complete certainty, the Velox expression fuzzer will be extended to test this endpoint with fuzzer generated expressions in a follow-up PR. This should surface any remaining bugs.

Release Notes

Prestissimo (Native Execution) Changes
* Add http endpoint `v1/expressions` in sidecar for expression optimization.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Oct 30, 2025
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @pramodsatya, your pull request is larger than the review limit of 150000 diff characters

@pramodsatya pramodsatya force-pushed the expr_opt branch 2 times, most recently from d3f5a74 to 4224f36 Compare October 30, 2025 23:00
@pramodsatya pramodsatya marked this pull request as ready for review October 30, 2025 23:00
@prestodb-ci prestodb-ci requested review from a team and pratyakshsharma and removed request for a team October 30, 2025 23:00
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @pramodsatya, your pull request is larger than the review limit of 150000 diff characters

@pramodsatya
Copy link
Contributor Author

@aditi-pandit, @tdcmeehan, could you please take a look?

@pramodsatya pramodsatya force-pushed the expr_opt branch 2 times, most recently from 8034891 to 69c7c37 Compare November 5, 2025 18:37
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pramodsatya. Have bunch of comments.

@pramodsatya
Copy link
Contributor Author

Thanks @aditi-pandit, made the changes as suggested. Could you PTAL?

constexpr char const* kOptimizerLevelHeader =
"X-Presto-Expression-Optimizer-Level";

velox::core::TypedExprPtr rewriteInExpression(
Copy link
Contributor

@aditi-pandit aditi-pandit Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pramodsatya : I'm confused about this. This is registered as a regular rewrite with the Velox optimization. So then its going to happen post Presto -> Velox expression conversion. If Presto -> Velox expression conversion can generate expressions with this IN expression format then how was Velox evaluating them in the past ? If it was missing this structural change then we can add it in Velox itself.. Conversely, if the Presto -> Velox expression conversion generates IN with subfield filters then we this optimization would never be triggered anyway. Don't follow the reasoning to have this in the Prestissimo layer.

Copy link
Contributor Author

@pramodsatya pramodsatya Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aditi-pandit, I added this here because the resolved Velox function was throwing an error for cases where IN-list is not constant. I do not see a similar error in regular execution path and this needs further analysis. Reverted this for now and will move the rewrite to Velox after fixing this error. 4 of the tests that validate this optimization have been disabled and will enable them once Velox rewrite is in.
Could you please take another look?

resultExpression = inputRowExpr;
}
protocol::to_json(j, resultExpression);
VLOG(1) << "Optimized RowExpression: " << j.dump();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log level 1 seems a bit too low for this detail?
If the resultExpression failed how would the user be helped with this? This assumes the saw the unoptimized version and ow can see it was either failing to optimize or couldn't be converted.
Basically, I'm not clear how useful this VLOG is other than seeing an expression in the log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VLOGs were added for ease of debugging in case there is an error during expression evaluation, removed them and added a LOG(ERROR) with info about expression that fails during evaluation, thanks.

[[fallthrough]];
case velox::core::ExprKind::kInput:
[[fallthrough]];
case velox::core::ExprKind::kLambda:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a default that issues a VELOX_USER_ERROR about an unsupported expression kind? Are there others that are not covered? Could this lead to false positives?

Copy link
Contributor Author

@pramodsatya pramodsatya Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, there are no other expression types and I avoided adding a default case here to ensure the conversion logic from TypedExpr to RowExpression will be explicitly added here for any additional ExprKinds that could be introduced in Velox in the future. If a new ExprKind is introduced and the default case returns a nullptr, the VeloxToPrestoExpr conversion could fail for any subexpression, since this function is called recursively by other internal functions of VeloxToPrestoExprConverter (eg: functions that convert TypedExpr to SpecialFormExpression).
If a VELOX_USER_ERROR is issued here when conversion fails, the endpoint will not return a response so chose to return a nullptr instead and return an unoptimized RowExpression from the endpoint, is that fine?

Also since the VeloxToPrestoExpr conversion is tightly tied to PrestoToVeloxExpr conversion, the e2e Presto expression tests and the Presto expression fuzzer (to be added in a follow-up) would help detect expression conversion failures that result from any changes to PrestoToVeloxExpr.
Alternatively, the input RowExpression can be passed as an argument to getRowExpression and the default case could be to just return the input RowExpression in case VeloxToPrestoExpr conversion fails.

Update: Modified VeloxToPrestoExprConverter::getRowExpression to take in an optional RowExpression parameter, which would be returned if the expression conversion fails. Please let me know if this is alright.

What do you suggest @czentgr @aditi-pandit ?

steveburnett
steveburnett previously approved these changes Nov 12, 2025
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local doc build. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants